Misinformation Detection
Reasoning About the Unsaid: Misinformation Detection with Omission-Aware Graph Inference
Wang, Zhengjia, Wang, Danding, Sheng, Qiang, Wu, Jiaying, Cao, Juan
This paper investigates the detection of misinformation, which deceives readers either by explicitly fabricating misleading content or by implicitly omitting information necessary for informed judgment. While the former has been extensively studied, omission-based deception remains largely overlooked, even though it can subtly guide readers toward false conclusions under the illusion of completeness. To pioneer this direction, this paper presents OmiGraph, the first omission-aware framework for misinformation detection. Specifically, OmiGraph constructs an omission-aware graph for the target news by utilizing a contextual environment that captures complementary perspectives of the same event, thereby surfacing potentially omitted content. Based on this graph, omission-oriented relation modeling is then proposed to identify internal contextual dependencies, as well as dynamic omission intents, forming a comprehensive omission relation representation. Finally, to extract omission patterns for detection, OmiGraph introduces omission-aware message passing and aggregation that establishes holistic deception perception by integrating the omitted content and the omission relations. Experiments show that, by considering the omission perspective, our approach attains remarkable performance, achieving average improvements of +5.4% F1 and +5.3% ACC on two large-scale benchmarks.
- North America > United States > Illinois > Cook County > Chicago (0.05)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Asia > Singapore (0.04)
- (2 more...)
- Media > News (1.00)
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.93)
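The graph inference that the OmiGraph abstract describes can be illustrated with a minimal sketch. The node features, omission-relation weights, and mean-style aggregation rule below are illustrative assumptions, not the authors' actual design:

```python
# Minimal sketch of omission-aware message passing over a news graph.
# All names and weights are illustrative assumptions, not OmiGraph's design.

def message_pass(features, edges, self_weight=0.5):
    """One round of weighted neighbor aggregation.

    features: {node: [float, ...]} feature vectors of equal length
    edges:    {(src, dst): omission_weight} directed, weighted edges
    """
    updated = {}
    for node, feat in features.items():
        # Collect incoming messages weighted by the omission-relation score.
        msgs = [(w, features[src]) for (src, dst), w in edges.items() if dst == node]
        total_w = self_weight + sum(w for w, _ in msgs)
        agg = [self_weight * x for x in feat]
        for w, src_feat in msgs:
            agg = [a + w * x for a, x in zip(agg, src_feat)]
        updated[node] = [a / total_w for a in agg]
    return updated

# Target news node plus two contextual-environment nodes covering the same event.
feats = {"news": [1.0, 0.0], "ctx1": [0.0, 1.0], "ctx2": [0.0, 1.0]}
edges = {("ctx1", "news"): 0.8, ("ctx2", "news"): 0.2}  # omission-relation weights
out = message_pass(feats, edges)
```

After one round the target node absorbs the contextual perspectives in proportion to the edge weights, which is the intuition behind surfacing potentially omitted content.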
Insight-A: Attribution-aware for Multimodal Misinformation Detection
Wu, Junjie, Fu, Yumeng, Gong, Chen, Fu, Guohong
AI-generated content (AIGC) technology has emerged as a prevalent means of creating multimodal misinformation on social media platforms, posing unprecedented threats to societal safety. However, standard prompting of multimodal large language models (MLLMs) to identify emerging misinformation ignores misinformation attribution. To this end, we present Insight-A, which explores attribution with MLLM insights for detecting multimodal misinformation. Insight-A makes two efforts: (I) attributing misinformation to forgery sources, and (II) building an effective pipeline with hierarchical reasoning that detects distortions across modalities. Specifically, to attribute misinformation to forgery traces based on generation patterns, we devise cross-attribution prompting (CAP) to model the sophisticated correlations between perception and reasoning. Meanwhile, to reduce the subjectivity of human-annotated prompts, automatic attribution-debiased prompting (ADP) is used for task adaptation on MLLMs. Additionally, we design image captioning (IC) to capture visual details for enhanced cross-modal consistency checking. Extensive experiments demonstrate the superiority of our proposal and provide a new paradigm for multimodal misinformation detection in the era of AIGC.
- Europe > Austria > Vienna (0.14)
- Asia > Thailand > Bangkok > Bangkok (0.04)
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- (6 more...)
From Generation to Detection: A Multimodal Multi-Task Dataset for Benchmarking Health Misinformation
Zhang, Zhihao, Zhang, Yiran, Zhou, Xiyue, Huang, Liting, Razzak, Imran, Nakov, Preslav, Naseem, Usman
Infodemics and health misinformation have a significant negative impact on individuals and society, exacerbating confusion and increasing hesitancy in adopting recommended health measures. Recent advancements in generative AI, capable of producing realistic, human-like text and images, have significantly accelerated the spread and expanded the reach of health misinformation, resulting in an alarming surge in its dissemination. To combat infodemics, most existing work has focused on developing misinformation datasets from social media and fact-checking platforms, but has faced limitations in topical coverage, inclusion of AI generation, and accessibility of raw content. To address these issues, we present MM Health, a large-scale multimodal misinformation dataset in the health domain consisting of 34,746 news articles encompassing both textual and visual information. MM Health includes human-generated multimodal information (5,776 articles) and AI-generated multimodal information (28,880 articles) from various SOTA generative AI models. Additionally, we benchmark our dataset on three tasks (reliability checks, originality checks, and fine-grained AI detection), demonstrating that existing SOTA models struggle to accurately distinguish the reliability and origin of information. Our dataset aims to support the development of misinformation detection across various health scenarios, facilitating the detection of human- and machine-generated content at the multimodal level.
- North America > United States > Washington > King County > Seattle (0.04)
- North America > United States > Utah > Summit County > Park City (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- (8 more...)
- Media > News (1.00)
- Health & Medicine > Therapeutic Area > Immunology (0.95)
- Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.95)
- Government > Regional Government > North America Government > United States Government (0.68)
HiEAG: Evidence-Augmented Generation for Out-of-Context Misinformation Detection
Wu, Junjie, Fu, Yumeng, Yu, Nan, Fu, Guohong
Recent advancements in multimodal out-of-context (OOC) misinformation detection have made remarkable progress in checking the consistency between different modalities to support or refute image-text pairs. However, existing OOC misinformation detection methods tend to emphasize the role of internal consistency, ignoring the significance of external consistency between image-text pairs and external evidence. In this paper, we propose HiEAG, a novel Hierarchical Evidence-Augmented Generation framework to refine external consistency checking by leveraging the extensive knowledge of multimodal large language models (MLLMs). Our approach decomposes external consistency checking into a comprehensive engine pipeline that integrates reranking and rewriting in addition to retrieval. The evidence reranking module utilizes Automatic Evidence Selection Prompting (AESP) to acquire the relevant evidence items from the results of evidence retrieval. Subsequently, the evidence rewriting module leverages Automatic Evidence Generation Prompting (AEGP) to improve task adaptation on MLLM-based OOC misinformation detectors. Furthermore, our approach enables explanations for its judgments and achieves impressive performance with instruction tuning. Experimental results on different benchmark datasets demonstrate that our proposed HiEAG surpasses previous state-of-the-art (SOTA) methods in accuracy over all samples.
- North America > United States (0.14)
- Asia > China > Heilongjiang Province > Harbin (0.04)
- Information Technology > Data Science > Data Mining (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
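The retrieve-rerank-rewrite control flow that HiEAG decomposes external consistency checking into can be sketched as below. The term-overlap scorer and string rewriter are plain-Python stand-ins for the paper's MLLM prompting (AESP and AEGP), used only to show the pipeline shape:

```python
# Sketch of a retrieve -> rerank -> rewrite pipeline as HiEAG describes.
# Scoring and rewriting are toy stand-ins for AESP / AEGP prompting.

def rerank(evidence, claim_terms):
    # Stand-in relevance score: term overlap with the claim
    # (AESP would instead prompt an MLLM to select evidence).
    def score(item):
        return len(set(item.lower().split()) & claim_terms)
    return sorted(evidence, key=score, reverse=True)

def rewrite(item):
    # Stand-in for AEGP: condense the selected evidence for the detector.
    return "EVIDENCE: " + item

def external_consistency_pipeline(claim, retrieved):
    terms = set(claim.lower().split())
    best = rerank(retrieved, terms)[0]   # evidence reranking
    return rewrite(best)                 # evidence rewriting

out = external_consistency_pipeline(
    "mayor opens new bridge",
    ["weather report for tuesday", "the mayor opens a new bridge downtown"])
```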
FinVet: A Collaborative Framework of RAG and External Fact-Checking Agents for Financial Misinformation Detection
Araya, Daniel Berhane, Liao, Duoduo
Financial markets face growing threats from misinformation that can trigger billions in losses in minutes. Most existing approaches lack transparency in their decision-making and provide limited attribution to credible sources. We introduce FinVet, a novel multi-agent framework that integrates two Retrieval-Augmented Generation (RAG) pipelines with external fact-checking through a confidence-weighted voting mechanism. FinVet employs adaptive three-tier processing that dynamically adjusts verification strategies based on retrieval confidence, from direct metadata extraction to hybrid reasoning to full model-based analysis. Unlike existing methods, FinVet provides evidence-backed verdicts, source attribution, confidence scores, and explicit uncertainty flags when evidence is insufficient. Experimental evaluation on the FinFact dataset shows that FinVet achieves an F1 score of 0.85, a 10.4% improvement over the best individual pipeline (the fact-check pipeline) and a 37% improvement over standalone RAG approaches.
- North America > United States (0.14)
- Asia > Singapore (0.04)
- North America > Canada (0.04)
- (3 more...)
- Media > News (1.00)
- Law (1.00)
- Government (1.00)
- Banking & Finance > Trading (0.93)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
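A confidence-weighted voting mechanism with an explicit uncertainty flag, as the FinVet abstract describes, can be sketched minimally. The threshold, verdict labels, and flag value are assumptions for illustration, not the paper's settings:

```python
# Sketch of confidence-weighted voting over multiple verification pipelines.
# Threshold and labels are illustrative assumptions, not FinVet's actual values.

def weighted_vote(pipeline_outputs, min_confidence=0.3):
    """pipeline_outputs: list of (verdict, confidence) from each pipeline."""
    usable = [(v, c) for v, c in pipeline_outputs if c >= min_confidence]
    if not usable:
        # Explicit uncertainty flag when no pipeline is confident enough.
        return {"verdict": "uncertain", "confidence": 0.0,
                "flag": "insufficient_evidence"}
    scores = {}
    for verdict, conf in usable:
        scores[verdict] = scores.get(verdict, 0.0) + conf
    best = max(scores, key=scores.get)
    return {"verdict": best,
            "confidence": scores[best] / sum(scores.values()),
            "flag": None}

# Two RAG pipelines and a fact-check pipeline disagree; votes are confidence-weighted.
result = weighted_vote([("false", 0.9), ("false", 0.6), ("true", 0.4)])
```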
T^2Agent: A Tool-Augmented Multimodal Misinformation Detection Agent with Monte Carlo Tree Search
Cui, Xing, Zou, Yueying, Li, Zekun, Li, Peipei, Xu, Xinyuan, Liu, Xuannan, Huang, Huaibo
Real-world multimodal misinformation often arises from mixed forgery sources, requiring dynamic reasoning and adaptive verification. However, existing methods mainly rely on static pipelines and limited tool usage, restricting their ability to handle such complexity and diversity. To address this challenge, we propose T^2Agent, a novel misinformation detection agent that incorporates an extensible toolkit with Monte Carlo Tree Search (MCTS). The toolkit consists of modular tools such as web search, forgery detection, and consistency analysis. Each tool is described using standardized templates, enabling seamless integration and future expansion. To avoid the inefficiency of using all tools simultaneously, a greedy search-based selector is proposed to identify a task-relevant subset. This subset then serves as the action space for MCTS to dynamically collect evidence and perform multi-source verification. To better align MCTS with the multi-source nature of misinformation detection, T^2Agent extends traditional MCTS with multi-source verification, which decomposes the task into coordinated subtasks targeting different forgery sources. A dual reward mechanism, comprising a reasoning trajectory score and a confidence score, is further proposed to balance exploration across mixed forgery sources with exploitation of more reliable evidence. We conduct ablation studies to confirm the effectiveness of the tree search mechanism and tool usage. Extensive experiments further show that T^2Agent consistently outperforms existing baselines on challenging mixed-source multimodal misinformation benchmarks, demonstrating its strong potential as a training-free detector.
- North America > United States > Wisconsin (0.05)
- North America > United States > Virginia > Norfolk City County > Norfolk (0.04)
- North America > United States > Florida > Hillsborough County > Tampa (0.04)
- (2 more...)
- Media > News (1.00)
- Information Technology (1.00)
- Government > Regional Government (1.00)
- (2 more...)
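The greedy search-based tool selector in the T^2Agent abstract can be sketched as greedy set cover over forgery sources. The mapping of tools to the sources they can verify, and the budget, are assumptions for illustration:

```python
# Sketch of a greedy tool-subset selector; the coverage model (tool -> forgery
# sources it can verify) and the budget are illustrative assumptions.

def greedy_tool_subset(tool_coverage, required_sources, budget):
    """tool_coverage: {tool: set of forgery sources it can verify}.
    Greedily add the tool covering the most still-uncovered sources."""
    selected, covered = [], set()
    while len(selected) < budget and covered < required_sources:
        best, gain = None, 0
        for tool, cov in tool_coverage.items():
            if tool in selected:
                continue
            g = len((cov & required_sources) - covered)
            if g > gain:
                best, gain = tool, g
        if best is None:  # no remaining tool adds coverage
            break
        selected.append(best)
        covered |= tool_coverage[best]
    return selected

tools = {"web_search": {"ooc", "text"}, "forgery_detect": {"image"},
         "consistency": {"ooc"}}
subset = greedy_tool_subset(tools, {"ooc", "image", "text"}, budget=2)
```

The returned subset would then serve as the MCTS action space, keeping the search tree small while still covering the suspected forgery sources.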
FakeZero: Real-Time, Privacy-Preserving Misinformation Detection for Facebook and X
Essahli, Soufiane, Sarsar, Oussama, Bentajer, Ahmed, Motii, Anas, Fouad, Imane
Social platforms distribute information at unprecedented speed, which in turn accelerates the spread of misinformation and threatens public discourse. We present FakeZero, a fully client-side, cross-platform browser extension that flags unreliable posts on Facebook and X (formerly Twitter) while the user scrolls. All computation (DOM scraping, tokenization, Transformer inference, and UI rendering) runs locally through the Chromium messaging API, so no personal data leaves the device. FakeZero employs a three-stage training curriculum: baseline fine-tuning, domain-adaptive training enhanced with focal loss and adversarial augmentation, and post-training quantization. Evaluated on a dataset of 239,000 posts, the DistilBERT-Quant model (67.6 MB) reaches 97.1% macro-F1, 97.4% accuracy, and an AUROC of 0.996, with a median latency of approximately 103 ms on a commodity laptop. A memory-efficient TinyBERT-Quant variant retains 95.7% macro-F1 and 96.1% accuracy while shrinking the model to 14.7 MB and lowering latency to approximately 40 ms, showing that high-quality fake-news detection is feasible under tight resource budgets with only modest performance loss. By providing inline credibility cues, the extension can serve as a valuable tool for policymakers seeking to curb the spread of misinformation across social networks. With user consent, FakeZero also opens the door for researchers to collect large-scale datasets of fake news in the wild, enabling deeper analysis and the development of more robust detection techniques.
- North America > Canada > Quebec > Montreal (0.04)
- Asia > Singapore (0.04)
- North America > United States > California > San Francisco County > San Francisco (0.04)
- (5 more...)
- Research Report (0.83)
- Instructional Material > Course Syllabus & Notes (0.48)
- Media > News (1.00)
- Information Technology (1.00)
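The focal loss used in FakeZero's domain-adaptive stage has a standard closed form; a minimal binary-classification version is below. The alpha and gamma values are common defaults, not the paper's reported settings:

```python
# Minimal focal loss for binary fake-news classification.
# alpha/gamma are common defaults, not FakeZero's reported hyperparameters.
import math

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """p: predicted probability of the positive (fake) class, y: label in {0, 1}."""
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # (1 - p_t)^gamma down-weights easy, well-classified examples,
    # focusing training on hard or ambiguous posts.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

With gamma = 0 this reduces to alpha-weighted cross-entropy; larger gamma shrinks the loss contribution of confidently correct predictions.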
Misinformation Detection using Large Language Models with Explainability
Patel, Jainee, Bhatt, Chintan, Trivedi, Himani, Nguyen, Thanh Thi
The COVID Fake News dataset is a collection of mostly COVID-19 pandemic-specific news headlines and brief claims. It combines proven factual statements with the misleading or outright false information that was widespread on digital platforms during the pandemic. The dataset was preprocessed and split into balanced training (8,160 samples) and testing (2,041 samples) sets so that both real and fake labels could be evaluated robustly. To check whether the pipeline generalizes beyond the pandemic domain, the FakeNewsNet GossipCop dataset is also used. It covers entertainment and celebrity news, a prominent area where gossip, rumors, and fabricated stories are prevalent. Approximately 10,000 samples were used for training and 2,500 for testing. In this dataset, the labels distinguish news items as Real or Fake based on fact-checking against the original GossipCop platform. The two datasets were combined, standardized, and stratified to ensure balanced classes during training and validation. This careful training enables the models to better identify the subtle linguistic cues that distinguish genuine from fabricated claims, improving the pipeline's performance in practical misinformation detection applications.
- Media > News (1.00)
- Health & Medicine > Therapeutic Area > Immunology (0.49)
- Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.35)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.87)
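The balanced, stratified split described above can be sketched in a few lines; the test fraction and seed below are illustrative, not the paper's exact procedure:

```python
# Sketch of a stratified train/test split that preserves per-label proportions.
# test_frac and seed are illustrative assumptions.
import random

def stratified_split(samples, test_frac=0.2, seed=0):
    """samples: list of (text, label). Returns (train, test) with each label's
    proportion preserved in both splits."""
    rng = random.Random(seed)
    by_label = {}
    for s in samples:
        by_label.setdefault(s[1], []).append(s)
    train, test = [], []
    for label, group in by_label.items():
        rng.shuffle(group)
        k = int(len(group) * test_frac)
        test.extend(group[:k])
        train.extend(group[k:])
    return train, test

# Toy balanced corpus: 50 "fake" and 50 "real" claims.
data = [("claim %d" % i, "real" if i % 2 else "fake") for i in range(100)]
train, test = stratified_split(data)
```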
Fact-R1: Towards Explainable Video Misinformation Detection with Deep Reasoning
Zhang, Fanrui, Li, Dian, Zhang, Qiang, Chen, Jun, Liu, Gang, Lin, Junxiong, Yan, Jiahong, Liu, Jiawei, Zha, Zheng-Jun
The rapid spread of multimodal misinformation on social media has raised growing concerns, while research on video misinformation detection remains limited due to the lack of large-scale, diverse datasets. Existing methods often overfit to rigid templates and lack deep reasoning over deceptive content. To address these challenges, we introduce FakeVV, a large-scale benchmark comprising over 100,000 video-text pairs with fine-grained, interpretable annotations. In addition, we further propose Fact-R1, a novel framework that integrates deep reasoning with collaborative rule-based reinforcement learning. Fact-R1 is trained through a three-stage process: (1) misinformation long-Chain-of-Thought (CoT) instruction tuning, (2) preference alignment via Direct Preference Optimization (DPO), and (3) Group Relative Policy Optimization (GRPO) using a novel verifiable reward function. This enables Fact-R1 to exhibit emergent reasoning behaviors comparable to those observed in advanced text-based reinforcement learning systems, but in the more complex multimodal misinformation setting. Our work establishes a new paradigm for misinformation detection, bridging large-scale video understanding, reasoning-guided alignment, and interpretable verification.
- Indian Ocean > Red Sea (0.04)
- Asia > Middle East > Yemen (0.04)
- Asia > Middle East > Saudi Arabia (0.04)
- (12 more...)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.67)
- Media > News (1.00)
- Government > Regional Government > North America Government > United States Government (0.46)
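The GRPO stage in Fact-R1's training relies on group-relative advantages: rewards for a group of sampled responses are standardized against the group's own mean and deviation. A minimal sketch, with made-up reward values:

```python
# Sketch of GRPO's group-relative advantage computation: each sampled
# response's reward is standardized within its group. Rewards are made up.
import statistics

def group_relative_advantages(rewards, eps=1e-8):
    mu = statistics.fmean(rewards)
    sigma = statistics.pstdev(rewards)
    # eps guards against a zero-variance group.
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four sampled responses scored by a verifiable reward function.
adv = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
```

Above-average responses get positive advantage and below-average ones negative, so the policy update needs no separate learned value model.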
ViClaim: A Multilingual Multilabel Dataset for Automatic Claim Detection in Videos
Giedemann, Patrick, von Däniken, Pius, Deriu, Jan, Rodrigo, Alvaro, Peñas, Anselmo, Cieliebak, Mark
The growing influence of video content as a medium for communication and misinformation underscores the urgent need for effective tools to analyze claims in multilingual and multi-topic settings. Existing efforts in misinformation detection largely focus on written text, leaving a significant gap in addressing the complexity of spoken text in video transcripts. We introduce ViClaim, a dataset of 1,798 annotated video transcripts across three languages (English, German, Spanish) and six topics. Each sentence in the transcripts is labeled with three claim-related categories: fact-check-worthy, fact-non-check-worthy, or opinion. We developed a custom annotation tool to facilitate the highly complex annotation process. Experiments with state-of-the-art multilingual language models demonstrate strong performance in cross-validation (macro F1 up to 0.896) but reveal challenges in generalization to unseen topics, particularly for distinct domains. Our findings highlight the complexity of claim detection in video transcripts. ViClaim offers a robust foundation for advancing misinformation detection in video-based communication, addressing a critical gap in multimodal analysis.
- Europe > United Kingdom (0.14)
- Europe > Ukraine (0.05)
- Europe > Russia (0.04)
- (12 more...)
- Media > News (0.91)
- Government > Voting & Elections (0.68)
- Government > Regional Government > North America Government > United States Government (0.47)
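The macro F1 that ViClaim reports averages per-class F1 over its three claim categories, so rare classes count as much as frequent ones. A minimal sketch with toy labels (the example predictions are illustrative, not the paper's data):

```python
# Macro-F1 over ViClaim's three claim categories; predictions are toy data.

def macro_f1(y_true, y_pred, labels):
    f1s = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)

labels = ["check-worthy", "non-check-worthy", "opinion"]
truth = ["check-worthy", "opinion", "opinion", "non-check-worthy"]
pred  = ["check-worthy", "opinion", "non-check-worthy", "non-check-worthy"]
score = macro_f1(truth, pred, labels)
```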